author: Jorge Cimentada and Basilio Moreno
date: 6th of July 2019
class: section
font-family: 'Helvetica'
width: 1800
height: 900
```{r, echo = F}
set.seed(2131)
```
Alright, so far we have seen vectors, matrices and data frames.
```{r}
x <- sample(1:10)
x
```
We have 10 random numbers.
Their positions are:
```{r, echo = F}
setNames(x, 1:10)
```

If x is:

```{r, echo = F}
x
```

what is the result of:

```{r, eval = F}
x[c(1, 3, 8)] # Watch out for square brackets.
x[c(-1, -5)]
x[seq(1, 8, 2)]
x[NA]
x[]
```
Write it down without running it!
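The index types used in the exercise can be tried out on a fixed vector first. The sketch below is only an illustration; the vector v and its values are made up and are not part of the original slides.

```r
# Hypothetical vector, chosen so every result is easy to check by eye
v <- c(10, 20, 30, 40, 50)

pos   <- v[c(1, 3)]        # positive integers keep those positions
neg   <- v[c(-1, -5)]      # negative integers drop those positions
steps <- v[seq(1, 5, 2)]   # any integer sequence works as an index
na    <- v[NA]             # a logical NA is recycled to the length of v
all_v <- v[]               # an empty index returns the whole vector

print(pos)    # 10 30
print(neg)    # 20 30 40
print(steps)  # 10 30 50
print(na)     # NA NA NA NA NA
print(all_v)  # 10 20 30 40 50
```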
Subsetting in R
========================================================
Do these subsetting rules apply the same for all types of vectors?
```{r}
char <- letters[1:10]
lgl <- c(TRUE, FALSE, TRUE, TRUE, TRUE, FALSE, FALSE)
gender <- factor(sample(c("female", "male"), 10, replace = TRUE))
```
What about these ones?
```{r, eval = F}
char[c(1, 1, 1)]
lgl[c(TRUE, 5, 1)]
gender[c(1:3, TRUE)]
```

Super test:

```{r, eval = F}
super_vector <- c(char, gender, lgl)
super_vector[c(1, 11, 27)]
```
Subsetting rules are the same for all types of vectors.
Exceptions are:
Let’s go through each one…
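When working through the questions above, it may help to remember that c() coerces its arguments to a single type before any subsetting happens. A small sketch under that assumption; idx and mixed are hypothetical names used only here.

```r
# TRUE is coerced to 1 when it is combined with numbers,
# so this index is really c(1, 5, 1)
idx <- c(TRUE, 5, 1)
print(idx)            # 1 5 1
print(letters[idx])   # "a" "e" "a"

# With a character vector as the first argument, a factor contributes
# its integer codes (as text), not its labels
mixed <- c("a", factor("female"))
print(mixed)          # "a" "1"
```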
incremental: true

If you remember correctly, a matrix is a vector with rows and columns.
```{r}
x_matrix <- matrix(1:10, 5, 2) # 5 rows and 2 columns
x_matrix
```

Building on the previous examples, what would be the result of this?

```{r, eval = F}
x_matrix[c(1, 4, 6)]
```

To confuse you even more, what do you think would be the result of this?

```{r, eval = F}
x_matrix[2:3, ]
```
A matrix can be thought of as two things:
* Or a numeric vector with rows and columns
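Both kinds of indexing can be checked directly. The sketch below uses a small made-up matrix m (not the x_matrix from the slides) to contrast a single index, which treats the matrix as one long vector filled column by column, with a row and column index pair.

```r
m <- matrix(1:6, nrow = 3, ncol = 2)  # columns are 1:3 and 4:6

as_vector <- m[c(1, 4)]   # single index: positions in the underlying vector
one_row   <- m[2, ]       # row 2, all columns
one_col   <- m[, 2]       # all rows, column 2
block     <- m[2:3, ]     # several rows: the result is still a matrix

print(as_vector)   # 1 4
print(one_row)     # 2 5
print(one_col)     # 4 5 6
print(dim(block))  # 2 2
```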
```{r, echo = F}
x_matrix
```

Now that you know... what are the results of:

```{r, eval = F}
x_matrix[1:5, 2]
x_matrix[, 2]
x_matrix[1, 1]
x_matrix[1:10, 2]
x_matrix[, 1:2]
```
Subsetting in R
========================================================
Now, data frames are very similar to matrices.
```{r, echo = F}
set.seed(21)
our_df <- data.frame(letters = letters[1:10],
                     age = sample(25:50, 10),
                     lgl = sample(c(TRUE, FALSE), 10, replace = TRUE))
our_df
```
The same way matrices are subsetted!
```{r, eval = F}
# First 3 rows for all columns
our_df[1:3, ]
our_df[c(1, 8), 1:2]
our_df[c(5, 5, 5), 3]
```
What? Why is the last one a vector?
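A likely explanation is that [ simplifies its result when a single column is selected with the row, column form. The sketch below uses a made-up data frame df to show this default and the drop = FALSE argument that turns it off.

```r
df <- data.frame(id = 1:3, flag = c(TRUE, FALSE, TRUE))

dropped <- df[c(2, 2, 2), 2]                # one column: simplified to a vector
kept    <- df[c(2, 2, 2), 2, drop = FALSE]  # same selection, kept as a data frame

print(dropped)          # FALSE FALSE FALSE
print(class(dropped))   # "logical"
print(class(kept))      # "data.frame"
print(dim(kept))        # 3 1
```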
Subsetting in R
========================================================
So far we saw how to subset the same way we subset matrices.
* Data frames are lists, remember?
* They also have similar subsetting rules to lists.
```{r, eval = F}
# We lose the data frame dimensions using this method.
our_df[["age"]]
# We get a data frame with this one.
our_df["age"]
# We don't get a data frame here.
our_df$age
```
Following the ‘list’ subsetting rules for data frames:
The result should be:

```{r, echo = F}
our_df$age[c(3, 4, 9)]
```
Well, now that we’re at it… How does it work for lists?
```{r}
our_list <- list(data = our_df, x_matrix, gnd = gender)
```

Explanation

```{r, eval = F}
our_list
```

```{r, eval = F}
our_list[1]
```

```{r, eval = F}
our_list[[1]]
```

```{r, eval = F}
our_list[[1]][[1]]
```
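As a rough guide to the four calls above: [ returns a smaller list, while [[ pulls out the element itself. The sketch below uses a made-up list lst rather than the list from the slides, so its names and values are assumptions for illustration only.

```r
lst <- list(data = data.frame(a = 1:2, b = c("x", "y")), nums = c(5, 6, 7))

sub_list <- lst[1]         # still a list, of length 1
element  <- lst[[1]]       # the data frame stored in the first slot
column   <- lst[[1]][[1]]  # the first column of that data frame

print(class(sub_list))  # "list"
print(class(element))   # "data.frame"
print(column)           # 1 2
```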
========================================================
incremental: true
What does this return?

```{r, eval = F}
our_df[["our_variable"]]
our_df["our_variable"]
our_df$our_variable
```
* Nothing!
* We're subsetting a variable that doesn't exist
* What is missing to create this variable?
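To see what "nothing" looks like in practice, the sketch below asks for a column that does not exist in a made-up data frame df. In current versions of R, [[ and $ return NULL, whereas single brackets stop with an error; tryCatch is used here only so the example runs to the end.

```r
df <- data.frame(age = c(30, 41))

with_double <- df[["missing_col"]]   # NULL
with_dollar <- df$missing_col        # NULL

# Single brackets raise an error instead; we capture its message as text
with_single <- tryCatch(
  df["missing_col"],
  error = function(e) conditionMessage(e)
)

print(is.null(with_double))  # TRUE
print(is.null(with_dollar))  # TRUE
print(with_single)           # the error message, e.g. about undefined columns
```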
Subsetting in R
========================================================
incremental: true
Three ways of creating a variable:
```{r, eval = F}
our_df[["our_variable"]] <- 1:10
our_df["our_variable"] <- 11:20
our_df$our_variable <- seq(1, 20, 2)
```
There’s one other way of doing it… Think hard about [] and the comma that divides rows and columns.

```{r, eval = F}
our_df[, "our_variable"] <- "this repeats until end"
```
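All four forms rely on the same assignment rules, including recycling of a single value to every row. A minimal sketch on a made-up data frame df; the column names v1 to v4 are placeholders.

```r
df <- data.frame(id = 1:4)

df[["v1"]] <- 1:4                # list-style, double brackets
df["v2"]   <- 5:8                # list-style, single brackets
df$v3      <- seq(1, 8, 2)       # dollar sign: 1, 3, 5, 7
df[, "v4"] <- "same everywhere"  # matrix-style; one value is recycled

print(names(df))       # "id" "v1" "v2" "v3" "v4"
print(unique(df$v4))   # "same everywhere"
```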
incremental: true
Add two variables to the our_df data frame from any of the options above.
* One that is TRUE for when age is above or equal to 35.
* One that adds our_df$age and our_df$lgl.
* Call them whatever you want.

```{r}
our_df$lgl_two <- our_df$age >= 35
our_df$add <- our_df$age + our_df$lgl
```
When we subset, we almost never do it the way we’ve been doing so far.
You have all the tools to achieve this, can you tell me how to do this?
Ok, we only want people with ages below 40 years old.
```{r, eval = F}
age < 40
```

Everything set!

age is not a variable out there in our environment!

```{r}
our_df$age < 40
```

Rows c(2, 4, 7, 8, 10) comply with the logical statement.

incremental: true

```{r}
our_df[c(2, 4, 7, 8, 10), ]
our_df[our_df$age < 40, ]
```
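The two calls are equivalent because a logical vector selects exactly the rows where it is TRUE. The sketch below checks this on a made-up data frame people; which() is used to recover the row positions from the logical vector.

```r
people <- data.frame(name = c("a", "b", "c", "d"), age = c(45, 32, 38, 50))

is_young  <- people$age < 40   # FALSE TRUE TRUE FALSE
positions <- which(is_young)   # 2 3

by_logical  <- people[is_young, ]
by_position <- people[positions, ]

print(positions)                                     # 2 3
print(isTRUE(all.equal(by_logical, by_position)))    # TRUE
```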
We can subset pretty much anything with logical vectors.
```{r, eval = F}
gender[gender == "female"]
lgl[lgl == TRUE]
```

Always think about the details!

```{r}
gender == "female" # is a logical statement
```

We could’ve written:

```{r, eval = F}
gender[c(FALSE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, TRUE, TRUE, FALSE)]
```
But that’s too long.
Let’s move on to functions.
What are functions?
All at the same time!
For example, take the sd function (standard deviation).
```{r}
class(x)
class(sd)
x
sd
```

```{r, eval = F}
sd(x)
```
returns the standard deviation of a variable
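To connect the function call with what it computes, the sketch below compares sd() against the sample standard deviation written out by hand. The vector v is made up for illustration.

```r
v <- c(2, 4, 4, 4, 5, 5, 7, 9)

# Sample standard deviation: squared deviations from the mean,
# divided by n - 1, then square-rooted
by_hand <- sqrt(sum((v - mean(v))^2) / (length(v) - 1))

print(class(sd))                           # "function"
print(isTRUE(all.equal(sd(v), by_hand)))   # TRUE
```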
When you have questions about a function, type ?function_name
incremental: true
```{r}
x <- rnorm(100)
y <- x + rnorm(100, mean = 1, sd = 1)
```

* Read what ?rnorm does.
* Use ?cor to calculate the correlation between x and y.
* Set the method argument to be "spearman".

```{r, eval = F}
cor(x, y, method = "spearman")
```
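One way to see what the method argument changes: Spearman's correlation is Pearson's correlation computed on ranks. The sketch below checks this on simulated data; the seed value is arbitrary and only makes the example reproducible.

```r
set.seed(1)  # arbitrary seed, an assumption for reproducibility
x <- rnorm(100)
y <- x + rnorm(100, mean = 1, sd = 1)

spearman <- cor(x, y, method = "spearman")
on_ranks <- cor(rank(x), rank(y))  # default method is "pearson"

print(isTRUE(all.equal(spearman, on_ranks)))  # TRUE
```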
========================================================

# To be continued…